ref_time_ticks: normalize to nanoseconds on every platform #2685

Merged: borisbat merged 3 commits into master from bbatkin/ref-time-ticks-normalize-ns on May 16, 2026

@borisbat
Collaborator

Summary

ref_time_ticks() now returns CLOCK_MONOTONIC-style nanoseconds on every platform. Previously it returned raw QueryPerformanceCounter ticks (~10 MHz typical) on Windows but clock_gettime nanoseconds on Linux/macOS. That unit mismatch silently broke the natural deadline pattern:

let deadline = ref_time_ticks() + int64(timeout_sec * 1_000_000)
while (ref_time_ticks() < deadline) { ... }

For a 30 s timeout, that deadline lasts about 3 s on Windows (10× too short at 10 MHz, but close enough to go unnoticed) and 30 ms on POSIX (1000× too short). The bug surfaced as a CI hang in dasImgui's playwright harness: deadlines elapsed almost instantly on Linux/macOS runners, while Windows happened to land in the right ballpark.
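
A minimal sketch of a unit-safe version of that deadline loop, written in C++ rather than .das. It assumes a nanosecond monotonic clock modeled with std::chrono::steady_clock; the names ref_time_ticks_ns, seconds_to_ns and wait_until are hypothetical and are not part of the daScript API.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Stand-in for a nanosecond ref_time_ticks() (assumption: steady_clock models it).
static int64_t ref_time_ticks_ns() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// Hypothetical helper: convert a timeout in seconds to a nanosecond offset.
static int64_t seconds_to_ns(double timeout_sec) {
    return static_cast<int64_t>(timeout_sec * 1e9);
}

// Poll pred() until it returns true or the deadline passes.
// Returns true if pred() succeeded before the timeout.
template <typename Pred>
static bool wait_until(Pred pred, double timeout_sec) {
    const int64_t deadline = ref_time_ticks_ns() + seconds_to_ns(timeout_sec);
    while (ref_time_ticks_ns() < deadline) {
        if (pred()) return true;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return false;
}
```

The point of the sketch is that the offset added to the timestamp is expressed in the same unit the clock returns, so the loop behaves the same on every platform.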

What changed

  • Windows ref_time_ticks() divides QPC by the cached QueryPerformanceFrequency and returns nanoseconds. The conversion is split into whole seconds plus a remainder, (ticks / freq) * 1e9 + (ticks % freq) * 1e9 / freq, so the intermediate never overflows int64.
  • qpc_freq() caches the QPF result in a static. QPF is invariant after boot, and the initialisation race is benign (parallel initialisers compute the same value).
  • get_time_usec, get_time_nsec, ref_time_delta_to_usec collapse to trivial sub/div now that the unit is uniform.
  • Fixed a long-standing typo in the old Windows ref_time_delta_to_usec (it called QueryPerformanceCounter(&freq) instead of QueryPerformanceFrequency, so it was returning garbage).
  • POSIX paths unchanged semantically, just tidied for symmetry.
  • Doc comment at the top explains the unification + steers future callers toward get_time_usec(start) / get_time_nsec(start).
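
The split conversion described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code from src/hal/performance_time.cpp: the helper name ticks_to_ns_split is hypothetical, and it assumes non-negative ticks and a frequency below roughly 9.2e9 Hz so that the remainder term fits in int64.

```cpp
#include <cassert>
#include <cstdint>

// Convert raw counter ticks to nanoseconds without overflowing int64.
// Hypothetical helper; assumes ticks >= 0 and 0 < freq < ~9.2e9.
static int64_t ticks_to_ns_split(int64_t ticks, int64_t freq) {
    assert(freq > 0);
    const int64_t kNsPerSec = 1000000000LL;
    const int64_t whole = ticks / freq;  // complete seconds
    const int64_t rem   = ticks % freq;  // leftover ticks, always < freq
    // rem * kNsPerSec stays below freq * 1e9, so it cannot overflow
    // for the frequencies assumed above.
    return whole * kNsPerSec + rem * kNsPerSec / freq;
}
```

A naive ticks * 1e9 / freq overflows int64 once ticks exceeds about 9.2e9, which a 10 MHz counter reaches after roughly 15 minutes of uptime; the split form avoids that intermediate.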

Callers

All in-tree callers, both C++ (ast_parse, ast_simulate, runtime_profile, builtins) and .das (dastest, daslib/profiler, strudel, examples, MCP tools), already use the safe `let t0 = ref_time_ticks()` + `get_time_usec(t0)` pattern, so no caller-side changes are required. The fix is purely in src/hal/performance_time.cpp.
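
As an illustration of why that pattern is unit-safe, here is a rough C++ sketch of helpers that reduce to subtraction and division once the timestamp is in nanoseconds. The signatures are assumptions made for this example (the real ones live in src/hal/performance_time.cpp and may differ), and the clock is modeled with std::chrono::steady_clock; everything sits in a hypothetical namespace sketch to avoid implying it is the library's code.

```cpp
#include <chrono>
#include <cstdint>

namespace sketch {

// Stand-in for the normalized ref_time_ticks(): monotonic nanoseconds.
inline int64_t ref_time_ticks() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// With a uniform unit, a delta converts to microseconds by plain division.
inline int64_t ref_time_delta_to_usec(int64_t delta_ns) {
    return delta_ns / 1000;
}

// Nanoseconds elapsed since the timestamp t0.
inline int64_t get_time_nsec(int64_t t0) {
    return ref_time_ticks() - t0;
}

// Microseconds elapsed since the timestamp t0.
inline int64_t get_time_usec(int64_t t0) {
    return ref_time_delta_to_usec(ref_time_ticks() - t0);
}

} // namespace sketch
```

Callers that only pass t0 back into such helpers never do arithmetic on the raw value themselves, which is why they were unaffected by the old unit mismatch.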

Test plan

  • Builds clean on Windows (MSVC, Release)
  • Local smoke: sleep(1000ms) after ref_time_ticks(); get_time_usec reports ~1_000_000, get_time_nsec ~1_000_000_000, and the raw delta is in the ns ballpark
  • CI matrix (Linux/macOS unchanged semantically; Windows now matches POSIX)

🤖 Generated with Claude Code

Until this commit, ref_time_ticks() returned raw QueryPerformanceCounter
ticks on Windows (~10 MHz) and clock_gettime nanoseconds on Linux/macOS.
Any caller that did raw arithmetic on the result -- the natural

    let deadline = ref_time_ticks() + int64(timeout_sec * 1_000_000)
    while (ref_time_ticks() < deadline) { ... }

deadline pattern -- silently got about 3 s on Windows (10x too short at
10 MHz, but close enough to go unnoticed) and 30 ms on POSIX (1000x too
short). Recently surfaced as a CI hang in
dasImgui's playwright harness, which read deadlines that elapsed instantly
on Linux/macOS runners.

Normalize ref_time_ticks() to nanoseconds on Windows by dividing the QPC
counter by the cached QueryPerformanceFrequency. The conversion uses a
split whole+remainder fold so the intermediate never overflows int64.
QPF is cached once per process (invariant after boot, race-tolerant).

Helpers (get_time_usec, get_time_nsec, ref_time_delta_to_usec) become
trivial subtraction/division now that the unit is uniform; the Windows
ref_time_delta_to_usec also loses a long-standing QPC-for-QPF typo that
made it return garbage in the old implementation. The POSIX helpers are
unchanged in semantics, just tidied for symmetry.

All in-tree callers (C++ and .das, including dastest, daslib/profiler,
strudel, examples, MCP tools) already use the safe `let t0 = ref_time_ticks()`
+ `get_time_usec(t0)` pattern, so no caller changes are needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 16, 2026 10:19
Contributor

Copilot AI left a comment

Pull request overview

Normalizes ref_time_ticks() to return CLOCK_MONOTONIC-style nanoseconds on Windows, matching the existing Linux/macOS behavior. Previously the Windows implementation returned raw QueryPerformanceCounter ticks (~10 MHz typical), causing silently wrong arithmetic in deadline patterns shared between POSIX and Windows. Also fixes a long-standing typo (QueryPerformanceCounter(&freq) instead of QueryPerformanceFrequency) in the old ref_time_delta_to_usec.

Changes:

  • Windows ref_time_ticks() now converts QPC ticks to ns using a split whole + remainder fold to avoid int64 overflow, with a cached QueryPerformanceFrequency value via a new qpc_freq() helper.
  • get_time_usec, get_time_nsec, and ref_time_delta_to_usec collapse to trivial subtraction/division on all platforms now that units are uniform; the buggy QPC-instead-of-QPF call is removed.
  • Adds a top-of-file doc comment explaining the unification and steering callers toward get_time_usec/get_time_nsec wrappers.

borisbat and others added 2 commits May 16, 2026 06:48
The previous commit's split conversion (whole+rem with two idiv per call)
doubled the cost of ref_time_ticks() from ~22 ns to ~40 ns on modern
Windows. That matters for the function profiler, which brackets every
profiled call: ~36 ns of profiler skew per function vs ~14 ns before.

Precompute `qpc_ns_per_tick = 1e9 / freq` once when QPF divides 1e9
cleanly. On current Windows 10/11 boxes that's typically the case (QPF is 10 MHz,
ns_per_tick = 100), so the hot path collapses to one multiply and the
call returns to ~23 ns -- within 1 ns of the bare QueryPerformanceCounter
cost.

The split fallback stays for paranoid completeness on non-divisible QPF
(theoretical; not observed on shipping Windows hardware in years).
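
A minimal sketch of the fast path with its fallback, assuming the frequency is passed in rather than read from QueryPerformanceFrequency so that it builds on any platform. QpcScale, make_scale and ticks_to_ns are hypothetical names, not the identifiers used in the commit.

```cpp
#include <cstdint>

// Precomputed conversion state (hypothetical type for this sketch).
struct QpcScale {
    int64_t freq;         // counter frequency in Hz
    int64_t ns_per_tick;  // 1e9 / freq when that divides cleanly, else 0
};

// Compute the scale once; in the real code this would be cached per process.
static QpcScale make_scale(int64_t freq) {
    const int64_t kNsPerSec = 1000000000LL;
    const int64_t per_tick = (kNsPerSec % freq == 0) ? kNsPerSec / freq : 0;
    return QpcScale{freq, per_tick};
}

static int64_t ticks_to_ns(const QpcScale &s, int64_t ticks) {
    if (s.ns_per_tick != 0) {
        return ticks * s.ns_per_tick;  // hot path: a single multiply
    }
    // Fallback for frequencies that do not divide 1e9: split conversion.
    const int64_t kNsPerSec = 1000000000LL;
    return (ticks / s.freq) * kNsPerSec + (ticks % s.freq) * kNsPerSec / s.freq;
}
```

With a 10 MHz frequency the scale holds ns_per_tick = 100, so every call takes the multiply branch; the division-based branch is only reached for frequencies such as 3579545 Hz.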

Microbench on this box (QPF=10MHz, MSVC /O2, 50M iterations):

  ref_old      (raw QPC, returns ticks)       22.3 ns/call    1.00x
  ref_new_split (whole+rem -> ns, previous)   39.8 ns/call    1.79x
  ref_new_fast  (ticks * ns_per_tick)         23.0 ns/call    1.03x

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update the handmade RST blurb for builtin::ref_time_ticks to reflect the
new contract: monotonic timestamp in nanoseconds, raw subtraction valid
since the unit is uniform across Windows/Linux/macOS. The previous wording
described the return value as opaque "ticks", which was platform-dependent
and led to caller-side deadline-math bugs (the dasImgui CI hang).

Sphinx clean build succeeded, zero new warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
borisbat merged commit e691abe into master May 16, 2026
28 checks passed
Copilot AI added a commit that referenced this pull request May 16, 2026
… post-PR #2685 ns normalization

Agent-Logs-Url: https://github.com/GaijinEntertainment/daScript/sessions/97224dec-45d1-4968-a3dd-8e5f37274983

Co-authored-by: borisbat <272689+borisbat@users.noreply.github.com>
pull Bot pushed a commit to forksnd/daScript that referenced this pull request May 17, 2026
PR GaijinEntertainment#2685 normalized ref_time_ticks() to nanoseconds across every
platform (Windows used to return raw QPC ticks at the underlying
counter's frequency — typically 10 MHz). The fix shipped without a
unit test that would have caught a units regression.

Add four tests under tests/fio/perf_time.das (sleep() lives in fio,
so this is the right neighborhood):

  - monotonic — 1000 successive reads never go backwards. Catches
    any signed/unsigned mixup or wrap-around bug in the ns conversion
    arithmetic.

  - sleep_roundtrip — sleep(100 ms) -> delta_ns must land in
    [80 ms, 500 ms]. The 80 ms lower bound is the load-bearing
    assertion: if Windows reverted to raw QPC ticks (10 MHz counter
    on the typical box -> a 100 ms wall-clock sleep would surface as
    1000000 "ticks" interpreted as ns, i.e. 1 ms), the test would
    trip. Wide upper bound covers CI runner scheduler jitter.

  - get_time_usec_agrees — the get_time_usec(t0) helper agrees with
    (ref_time_ticks() - t0) / 1000 within 5 ms. Two helpers reading
    the same underlying clock should not drift; if one ever ends up
    on a different code path, this notices.

  - units_are_nanoseconds — three back-to-back sleep(100 ms) deltas
    stay within 200 ms spread. If the unit accidentally changed
    mid-run (think: thread-local frequency cache going stale), the
    deltas would diverge wildly.
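
For illustration, the monotonic and sleep_roundtrip checks described above could look roughly like this in C++. This is a sketch, not the contents of tests/fio/perf_time.das: now_ns stands in for ref_time_ticks() and is modeled with std::chrono::steady_clock, and check_monotonic, sleep_delta_ns and in_range are hypothetical helpers.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Stand-in for ref_time_ticks(): monotonic nanoseconds (assumption).
static int64_t now_ns() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// monotonic: successive reads must never go backwards.
static bool check_monotonic(int reads) {
    int64_t prev = now_ns();
    for (int i = 0; i < reads; ++i) {
        const int64_t cur = now_ns();
        if (cur < prev) return false;
        prev = cur;
    }
    return true;
}

// sleep_roundtrip: measure how many nanoseconds a sleep of `ms` appears to take.
static int64_t sleep_delta_ns(int ms) {
    const int64_t t0 = now_ns();
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    return now_ns() - t0;
}

// Inclusive range check used for the [80 ms, 500 ms] window.
static bool in_range(int64_t v, int64_t lo, int64_t hi) {
    return v >= lo && v <= hi;
}
```

A 100 ms sleep should produce a delta of at least 80'000'000 ns; a clock that returned 10 MHz ticks would report about 1'000'000 and fall well below the lower bound.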

The test runs cleanly in both interpreter and AOT mode on Windows
(Win11 local): sleep(100 ms) -> 102-109 ms delta, get_time_usec
agrees to within microseconds. tests/aot/CMakeLists.txt:224 already
covers tests/fio/*.das via FILE(GLOB CONFIGURE_DEPENDS); cmake
reconfigure picks the new file up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pull Bot pushed a commit to forksnd/daScript that referenced this pull request May 17, 2026
…ssion

Cards added in the course of the linq_fold splice rewrite + PR GaijinEntertainment#2691
(has_sideeffects + counter-lane elision). Topics:

linq_fold / macro-emission patterns:
- daslang-generic-instance-detect-via-fromgeneric — func.fromGeneric is
  the canonical "which generic was this instantiated from?" link;
  func.name on typed instances is mangled.
- daslib-macro-boost-has-sideeffects-predicate — new public predicate,
  full classification table, known limitations, test plumbing.
- qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array — typedecl
  block-param const/ref handling differs between iterator and array
  sources; the two diagnostic error messages tell you which branch you
  picked wrong.
- qmacro-gensym-per-callsite-via-lineinfo — backtick-prefixed names +
  line+column suffix, force_at / force_generated / can_shadow.
- my-fold-macro-emits-a-loop-with-for-it-in-source-... (UPDATED) —
  peel_each pattern corrected for generic-instance detection + positive
  array gate + block-param typedecl handling.

LINQ semantics:
- are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-...
- which-typedecl-predicates-identify-types-where-length-expr-is-...
- why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-...
- what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-...
- my-macro-substitutes-it-for-a-projection-expression-via-template-...
- when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-...
- where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-...

Tooling / ops:
- how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-...
- cpp-profiler-macos-samply-instruments.md
- what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-...
- how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-...

Updated:
- why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-...
  — original card pointed at a require-order red herring; real cause
  was ref_time_ticks() returning ns on POSIX while wait_until_ready's
  deadline math assumed μs. Fix landed in PR GaijinEntertainment#2685.

No code changes — docs only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>